BigTable Characteristics

This lesson will explore some miscellaneous characteristics of BigTable.

We'll cover the following

BigTable performance
Dynamo vs. BigTable
Datastores developed on the principles of BigTable

BigTable performance

Here are a few reasons behind BigTable’s performance and popularity:

Distributed multi-level map: BigTable stores data as a sorted, multi-level map that is spread across a large number of machines.
Scalable: BigTable scales horizontally. Adding nodes to the cluster increases capacity and aggregate throughput, and tablets are redistributed across servers automatically, so no manual intervention or rebalancing is required. It is designed to run on commodity hardware.
Fault-tolerant and reliable: Since data is replicated to multiple nodes, the system tolerates the failure of individual machines.
Durable: BigTable stores data permanently; acknowledged writes are persisted in GFS rather than held only in memory.
Centralized: BigTable adopts a single-master approach to maintain data consistency and a centralized view of the state of the system.
Separation between control and data: BigTable maintains a strict separation between control flow and data flow. Clients contact the master only for metadata operations, such as creating tables and column families, whereas all data access happens directly between the clients and the tablet servers.
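To make the "multi-level map" in the list above more concrete, here is a minimal in-memory sketch of a map from row key to column to timestamp to value. The class name `TinyTable` and its methods are hypothetical and chosen only for illustration; this is not BigTable's API, and it ignores distribution, persistence, and garbage collection of old versions.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a multi-level map: row -> column -> timestamp -> value."""

    def __init__(self):
        # Nested dictionaries stand in for the three levels of the map.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, end_row):
        """Yield (row, columns) for start_row <= row < end_row in sorted row order.

        Sorting on every scan is an illustration shortcut; a real store
        keeps rows physically sorted.
        """
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, {c: dict(v) for c, v in self._rows[row].items()}


table = TinyTable()
table.put("com.cnn.www", "contents:", 5, "<html>v1")
table.put("com.cnn.www", "contents:", 6, "<html>v2")
table.put("com.example", "contents:", 1, "<html>example")

latest = table.get("com.cnn.www", "contents:")       # "<html>v2"
older = table.get("com.cnn.www", "contents:", 5)     # "<html>v1"
rows_in_range = [row for row, _ in table.scan("com.a", "com.d")]  # ["com.cnn.www"]
```

Because rows are visited in sorted order, a scan over a key range returns neighbouring rows together, which is the property that makes key-range operations natural in this data model.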

Dynamo vs. BigTable

Here is a comparison of Dynamo and BigTable:

	Dynamo	BigTable
Architecture	Decentralized: every node has the same set of responsibilities	Centralized: the master handles metadata, tablet servers handle reads and writes
Data model	Key-value	Multidimensional sorted map
Security	Not provided	Access rights at the column-family level
Partitioning	Consistent hashing: each node is assigned to a random position on the ring	Tablets: each table is broken into contiguous ranges of rows called tablets
Replication	Sloppy quorum: each data item is replicated to N nodes	GFS chunk replication: data is stored in GFS, where files are broken into chunks that are replicated to different servers
CAP	AP	CP
Operations	By key	By key range
Storage	Pluggable	SSTables in GFS
Membership and failure detection	Gossip-based protocol	Handshakes initiated by the master
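The partitioning and operations rows of the table can be illustrated with a small sketch that contrasts hash-based placement with range-based placement. Everything here is an assumption made for illustration: the class names `HashRing` and `TabletMap`, the use of MD5 as the ring hash, one ring position per node, and split keys treated as the exclusive end of each tablet. Neither system's real implementation is shown.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class HashRing:
    """Dynamo-style placement: a key belongs to the first node clockwise from its hash."""

    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def node_for(self, key):
        # Wrap around to the first node when the key hashes past the last position.
        i = bisect.bisect_right(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]


class TabletMap:
    """BigTable-style placement: each tablet owns a contiguous range of sorted row keys."""

    def __init__(self, split_keys, servers):
        # split_keys are the exclusive end keys of every tablet except the last one.
        assert len(servers) == len(split_keys) + 1
        self._splits = sorted(split_keys)
        self._servers = list(servers)

    def server_for(self, row_key):
        return self._servers[bisect.bisect_right(self._splits, row_key)]

    def servers_for_range(self, start, end):
        """Servers holding rows with start <= row < end."""
        lo = bisect.bisect_right(self._splits, start)
        hi = bisect.bisect_left(self._splits, end)
        return self._servers[lo:hi + 1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # one of the three nodes, decided by the hash

tablets = TabletMap(split_keys=["g", "p"], servers=["ts1", "ts2", "ts3"])
single = tablets.server_for("apple")            # "ts1"
touched = tablets.servers_for_range("a", "h")   # ["ts1", "ts2"]
```

Hashing scatters adjacent keys across the ring, so lookups are naturally by key. Range partitioning keeps adjacent rows on the same tablet, so a key-range scan touches only the servers whose tablets overlap that range.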

Datastores developed on the principles of BigTable

Google’s BigTable has inspired many NoSQL systems. Here is a list of a few famous ones:

HBase: HBase is an open-source, distributed non-relational database modeled after BigTable. It is built on top of the Hadoop Distributed File System (HDFS).

Hypertable: Similar to HBase, Hypertable is an open-source implementation of BigTable, written in C++. Unlike BigTable, which uses only one storage layer (i.e., GFS), Hypertable can run on top of different file systems (e.g., HDFS, GlusterFS, or CloudStore). To achieve this, it abstracts the file system interface by sending all data requests through a Distributed File System broker process.

Cassandra: Cassandra is a distributed, decentralized, and highly available NoSQL database. Its architecture is based on Dynamo and BigTable. Cassandra can be described as a BigTable-like datastore running on a Dynamo-like infrastructure. Cassandra is also a wide-column store and utilizes the storage model of BigTable, i.e., SSTables and MemTables.
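The MemTable and SSTable storage model mentioned above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `TinyStore`, `SSTable`, and `memtable_limit` are hypothetical names, the "SSTables" live in memory instead of on disk, and the commit log, compaction, and deletes are left out entirely.

```python
import bisect


class SSTable:
    """Immutable, sorted run of (key, value) pairs; stands in for an on-disk file."""

    def __init__(self, sorted_items):
        self._keys = [k for k, _ in sorted_items]
        self._values = [v for _, v in sorted_items]

    def get(self, key):
        # Binary search works because the keys are sorted.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


class TinyStore:
    """Writes go to an in-memory MemTable; a full MemTable is flushed to a new SSTable."""

    def __init__(self, memtable_limit=2):
        self._memtable = {}
        self._sstables = []  # oldest first
        self._limit = memtable_limit

    def put(self, key, value):
        self._memtable[key] = value
        if len(self._memtable) >= self._limit:
            self._flush()

    def _flush(self):
        # Sort once at flush time and freeze the result as an SSTable.
        self._sstables.append(SSTable(sorted(self._memtable.items())))
        self._memtable = {}

    def get(self, key):
        # The newest data wins: check the MemTable first, then SSTables newest to oldest.
        if key in self._memtable:
            return self._memtable[key]
        for sstable in reversed(self._sstables):
            value = sstable.get(key)
            if value is not None:
                return value
        return None


store = TinyStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # MemTable reaches its limit and is flushed to the first SSTable
store.put("a", 3)   # newer version of "a" now sits in the MemTable
value_a = store.get("a")  # 3, served from the MemTable
value_b = store.get("b")  # 2, served from the flushed SSTable
```

Since flushed tables are never modified, an updated key simply appears again in a newer table, and reads resolve the conflict by consulting the newest source first.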
